We present a novel dataset and a novel algorithm for recognizing activities of daily living (ADL) from a first-person wearable camera. Handled objects are crucially important for egocentric ADL recognition. To examine objects related to a user's actions separately from other objects in the environment, many previous works have addressed the detection of handled objects in images captured by head-mounted and chest-mounted cameras. Nevertheless, detecting handled objects is not always easy: they tend to appear small in images, and they can be occluded by the user's body. In this work, we instead mount a camera on the user's wrist. A wrist-mounted camera captures handled objects at a large scale, which allows us to skip the object detection process. To compare wrist-mounted and head-mounted cameras, we also develop a novel, publicly available dataset that includes videos and annotations of daily activities captured simultaneously by both cameras. Additionally, we propose a discriminative video representation that retains spatial and temporal information after encoding the frame descriptors extracted by convolutional neural networks (CNNs).